In March 2012, I discovered the Codea app, allowing coding on an iPad, and the Lua programming language, its foundation. That was the unlikely seed for discovering the functional programming language Haskell in late 2013: a Codea discussion mentioned Project Euler and Haskell solutions to its problems. In particular, a one-line solution for Problem 9: the Pythagorean triplet which sums to 1,000.
1 |
[(a,b,c) | a <- [1..500], b <- [1..a-1], c <- [1000-a-b], c < 500, a^2 + b^2 == c^2] |
I was attracted by Haskell’s apparently efficient, mathematical syntax. Later, I realised that there were parallels with Excel, the functional ‘programming language’ that I use every day (informed by the paper Improving the world’s most popular functional language: user-defined functions in Excel by Simon Peyton Jones, Margaret Burnett and Alan Blackwell).
Syntax highlighting
I wanted a (free) syntax highlighter plugin for WordPress version 4.9.4 that would work well with Haskell code. Ultimately, I settled on Crayon Syntax Highlighter by Aram Kocharyan.
I ruled out Code Prettify by Kaspars Dambis because it is based on the Google code-prettify library and I understood that library did not handle Haskell well.
1 2 3 4 5 6 7 8 9 10 |
module Main where main :: IO () main = do let ps = [(a, b, c) | a <- [1 .. 500] , b <- [1 .. a - 1] , c <- [1000 - a - b] , c < 500 , a^2 + b^2 == c^2] print ps |
Configuring the theme
My favourite code editor is Visual Studio (VS) Code and I use Justin Adam’s Haskell Syntax Highlighting extension with the default dark theme. I wanted the Crayon plugin to highlight Haskell code in the same way as the extension, to the extent that was possible. Syntax highlighting is the product of a language grammar, which names scopes, and a theme, which associates styles with names. VS Code uses TextMate language grammar and themes. Crayon’s approach is simpler.
The first step was to analyse the VS Code language grammar for Haskell, found in the extension’s file haskell.tmLanguage
. This is in the form of an XML-format property list, and I used the plist
package to identify the names that it contained. The plist
package makes use of the hxt
package (the Haskell XML Toolbox).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
module Main where import Data.List (foldl', intercalate, nub, sort) import Text.XML.HXT.Arrow.XmlState.SystemConfig (no, withSubstDTDEntities, withValidate) import Text.XML.Plist (PlObject (PlArray, PlDict, PlString), readPlistFromFile) import Paths_processTM (getDataFileName) main :: IO () main = do uri <- getDataFileUri "haskell.tmLanguage" pl <- readPlistFromFile [ withValidate no , withSubstDTDEntities no ] uri let scopes = (intercalate "\n" . sort . nub . names []) pl writeFile "scopes.txt" scopes names :: [String] -> PlObject -> [String] names ns p | PlArray ps <- p = foldl' names ns ps | PlDict ds <- p = foldl' names' ns ds | otherwise = ns names' :: [String] -> (String, PlObject) -> [String] names' ns ("name", PlString s) = s : ns names' ns (_, p) = names ns p getDataFileUri :: String -> IO String getDataFileUri dataFile = do fileName <- getDataFileName dataFile return $ uriFromPath fileName -- |This function assumes that the fileName is an fully qualified one beginning -- with a disk designator uriFromPath :: String -> String uriFromPath fileName = "file:///" ++ normalise fileName -- |Windows uses both \ and / as folder separators. This function replaces any -- \ with /. normalise :: String -> String normalise = map repl where repl '\\' = '/' repl c = c |
Initially, hxt
could not parse the haskell.tmLanguage
file. The XML specification forbids <
characters inside elements, but the file had three instances that were not converted into <
entity references. It appears that VS Code is tolerant of some mis-specified XML files.
The default dark colour scheme used by VS Code is established in .json
files dark_defaults
, dark_vs
and dark_plus
. The following scheme colours are used with the Haskell Syntax Highlighting extension:
Colour | Code | Use | Plugin element |
---|---|---|---|
Black | #1E1E1E | Background | Not applicable |
Green | #608B4E | Comments | COMMENT |
Brown | #CE9178 | String literals, character literals | STRING |
Tan | #D7BA7D | Escaped character literals | Not replicated |
Light green | #B5CEA8 | Numerical literals | CONSTANT |
Pink | #C586C0 | Control flow keywords (do , mdo , if , then , else , case and of ) |
STATEMENT |
Blue | #569CD6 | Compiler pragmas, other keywords, types, :: , -> |
PREPROCESSOR, RESERVED, TYPE |
Light blue | #9CDCFE | Type variables in data or newtype declarations. |
Not replicated |
Blue green | #4EC9B0 | Classes in deriving declarations. |
Not replicated |
Yellow | #DCDCAA | Variable names in type signatures, exports and imports. | ENTITY |
White | #D4D4D4 | Default text | IDENTIFIER, OPERATOR, SYMBOL |
Configuring the language grammar
The Crayon plugin’s grammar files for a user-defined language are located in its subfolder of wp-content/uploads/crayon-syntax-highlighter/langs
. If a user-defined language has the same folder name as a built-in language folder, the user-defined one takes precedence.
The plugin’s language grammar associates a regular expression (regex) with a list of unique elements, user-defined or built-in to the plugin theme. A user-defined element can be associated with a built-in one. The regex language is PHP’s Perl-compatible regex (PCRE).
The grammar specification includes (undocumented) ‘modes’ CASE_INSENSITIVE
, MULTI_LINE
and SINGLE_LINE
, which are all set by default. The modes can be set (ON
, YES
or 1
) or unset (OFF
, NO
or 0
) with lines such as the following (at least one space is required between the mode name and the =
):
1 |
CASE_INSENSITIVE =OFF |
The grammar for Haskell provided with Crayon was as follows (for those elements which reference the default language grammar, I have added the default as a following comment):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
NAME Haskell VERSION 1.9.8 COMMENT (--.*?$)|({-[^\}]*}) STRING (?default) # STRING ((?<![^\\]\\)".*?(?<![^\\]\\)")|((?<![^\\]\\)'.*?(?<![^\\]\\)') QUALIFIER:VARIABLE (?<=import)\s+[^\s]+ RESERVED \b(?alt:reserved.txt)\b TYPE \b(?alt:type.txt)\b RECORD:VARIABLE \b\w+\b\s*(?=::)(?=[^{]*}) ENTITY \b\w+\b\s*(?=::) ARG:VARIABLE (\b[\w\t ]+\b(?=\s*->))|((?<=->)\s*\b[\w\t ]+\b\s*$) CAPS:VARIABLE (?-i)\b[A-Z]\w+\b(?i) IDENTIFIER (?default) # IDENTIFIER \b[A-Za-z_]\w*\b CONSTANT (?default) # CONSTANT (?<!\w)[0-9][\.\w]* OPERATOR (?default) # OPERATOR (?alt:operator.txt) SYMBOL (?default) # SYMBOL &[^;]+;|(?alt:symbol.txt) |
reserved.txt
listed keywords but also certain functions from Haskell’s Prelude
module. type.txt
listed certain types from the Prelude. The default list of operators in operator.txt
was =&
, <<<
, >>>
, <<
, >>
, <<=
, =>>
, !==
, !=
, ^=
, *=
, &=
, %=
, |=
, /=
, +=
, -=
, ===
, ==
, <>
, ->
, <=
, >=
, ++
, --
, &&
, ||
, ::
, #
(escaped), +
, -
, *
, /
, %
, =
, &
, |
, ^
, ~
, !
, <
, >
and :
. The SYMBOL
element matched either HTML character entities or XML entity references (not relevant to Haskell) or the default list of characters in symbol.txt
(equivalent to the character class [~`!@#$%()_{}[\]|\\:;,.?]
), which was partly duplicative of the default operators.
I considered this supplied grammar to be lacking in certain respects, so I replaced it with the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
CASE_INSENSITIVE = OFF PRAGMA:PREPROCESSOR \{-#.*?#-} COMMENT \{-.*?-}|---*([^!#$%&*+./<=>?@\\^|\-~:].*?)?$ PREPROCESSOR ^\s*#.*?$ PACKAGE:IDENTIFIER (?<=^import\ssafe\squalified\s|^import\squalified\s|^import\ssafe\s|^import\s)\s*("[\d\-A-Za-z]+(\-\d+(\.\d+)*)?")?\s*[A-Z]['\w]*(\.[A-Z]['\w]*)* STRING (?<!['\w])'(\\['"]|\S+?)'|".*?(?<!\\)(\\\\)*" RESERVED (?<!['\w])(?alt:reserved.txt)(?!['#\w]) STATEMENT (?<!['\w])(?alt:statement.txt)(?!['#\w]) MODULE:IDENTIFIER (?<=^module\s|\sas\s)\s*[A-Z]['\w]*(\.[A-Z]['\w]*)* RESERVED_OP:TYPE (?<![!#$%&*+./<=>?@\\^|\-~:])(\((,+|->)?\)|\[]|(?<=\()\.\.(?=\))|::|->|=>)(?![!#$%&*+./<=>?@\\^|\-~:]) CONSTRUCTOR:TYPE (?<![!#$%&*+./<=>?@\\^|\-~:])(`([A-Z]['\w]*\.)*[A-Z]['\w]*#?`|([A-Z]['\w]*\.)*:[!#$%&*+./<=>?@\\^|\-~:]+) OPERATOR (?<![!#$%&*+./<=>?@\\^|\-~:])(`([A-Z]['\w]*\.)*[a-z]['\w]*#?`|([A-Z]['\w]*\.)*[!#$%&*+./<=>?@\\^|\-~:]+) ENTITY ^[a-z]['\w]*#?\s*(?=::[^!#$%&*+./<=>?@\\^|\-~:]) IDENTIFIER (?<!['\w])([A-Z]['\w]*\.)*[a-z]['\w]*#? TYPE (?<!['\w])([A-Z]['\w]*\.)*[A-Z]['\w]*#? CONSTANT (?<!['\w])(\d+(\.\d+([Ee][+\-]?\d+)?|[Ee][+\-]?\d+)|0([bB][01]+|[oO][0-7]+|[xX][\dA-Fa-f]+(\.[\dA-Fa-f]+([Pp][+\-]?\d+)?|[Pp][+\-]?\d+)?)|\d+)#{0,2} SYMBOL [\(),;[\]{}_] |
reserved.txt
lists only keywords (other than those in statement.txt
). I did not want to treat functions in the Prelude differently from other functions. I used new file statement.txt
to list the control flow keywords (including \case
) and the STATEMENT
element to give them a distinct colour, so that:
1 2 3 4 5 6 7 8 9 10 |
{-# LANGUAGE LambdaCase #-} x :: Bool -> Double -> Double x test value = if test then value else 0.0 y :: Int -> String y = \case 0 -> "zero" 1 -> "one" _ -> "lots" |
became:
1 2 3 4 5 6 7 8 9 10 |
{-# LANGUAGE LambdaCase #-} x :: Bool -> Double -> Double x test value = if test then value else 0.0 y :: Int -> String y = \case 0 -> "zero" 1 -> "one" _ -> "lots" |
I distinguished Haskell compiler pragmas from other nested comments, so that:
1 2 |
{-# LANGUAGE Haskell2010 #-} {- This is a comment -} |
became:
1 2 |
{-# LANGUAGE Haskell2010 #-} {- This is a comment -} |
The ability of regex to detect context is very limited, so it is not possible to nest ‘nested’ comments, but I wanted the detection of the end of a nested comment to be more accurate, so that:
1 2 3 4 5 |
{- Comment out unwanted code data Point = Point {x :: Int, y :: Int} origin :: Point origin = Point 0 0 -} |
became:
1 2 3 4 5 |
{- Comment out unwanted code data Point = Point {x :: Int, y :: Int} origin :: Point origin = Point 0 0 -} |
I wanted (non-nested) comments to be more accurate, so that:
1 |
x --> y -- This is a comment |
became:
1 |
x --> y -- This is a comment |
The supplied grammar did not recognise that Haskell identifiers can include '
, a non-word character in PCRE. This meant taking a different approach to character literals, so that:
1 2 3 |
foldl' :: (b -> a -> b) -> b -> t a -> b foldl' f z0 xs = foldr f' id xs z0 where f' x k z = k $! f z x |
became:
1 2 3 |
foldl' :: (b -> a -> b) -> b -> t a -> b foldl' f z0 xs = foldr f' id xs z0 where f' x k z = k $! f z x |
The limited ability to detect context meant that I was unable to implement a different colour for escaped character literals. Regex’s lookbehind ((?<= )
or (?<! )
) and lookahead ((?= )
or (?! )
facilities provide a basis for detecting some contexts but the lookbehind facility in PCRE requires the look back to be a known number of characters.
Certain keywords are, or can be, followed by the name of a module (module
, import
, safe
, qualified
and as
). I used regex’s lookbehind to identify such keywords, before matching the module name. In the case of imports, the name of the module may be qualified by the package name. So:
1 2 3 4 5 6 7 8 9 10 |
{-# LANGUAGE PackageImports, Safe #-} module Main (main) where import qualified "text-1.2.3.0" Data.Text as T (pack) import "text-1.2.3.0" Data.Text (Text) import safe Data.Time.Calendar (DayOfWeek (..)) days :: [Int] days = [1..7] |
became:
1 2 3 4 5 6 7 8 9 10 |
{-# LANGUAGE PackageImports, Safe #-} module Main (main) where import qualified "text-1.2.3.0" Data.Text as T (pack) import "text-1.2.3.0" Data.Text (Text) import safe Data.Time.Calendar (DayOfWeek (..)) days :: [Int] days = [1..7] |
I did not want to treat types in the Prelude differently from other types. I did not use type.txt
but defined a regex for the form of a type. The context-detection limitation meant that data constructors and classes are formatted in the same way as types and I was unable to implement a distinct colour for type variables (formatted like other variables) or classes (formatted like types). For the same reason, ->
is formatted in case ... of
statements the same way as in type signatures.
The ENTITY
element is used for the variable in a type signature. Such a variable is the first thing on a line.
I wanted to treat qualified identifiers the same way as others, so that:
1 |
x = Data.List.head xs |
became:
1 |
x = Data.List.head xs |
I wanted to implement GHC’s magic hash, so that:
1 2 3 4 5 6 |
{-# LANGUAGE MagicHash #-} import GHC.Prim squareplus4# :: Int# -> Int# squareplus4# x = x * x + 4# + 0xFF# |
became:
1 2 3 4 5 6 |
{-# LANGUAGE MagicHash #-} import GHC.Prim squareplus# :: Int# -> Int# squareplus# x = x * x + 4# + 0xFF# |
Rather than use the plugin’s default lists of operators and symbols, I defined regex expressions which reflected the Haskell 2010 Language Report, including infix operators and data constructors, so that:
1 2 3 4 5 6 7 |
data Complex a = a :+ a z :: Complex Double z = 1.0 :+ 2.0 x :: Int x = 100 `mod` 13 |
became:
1 2 3 4 5 6 7 |
data Complex a = a :+ a z :: Complex Double z = 1.0 :+ 2.0 x :: Int x = 100 `mod` 13 |
I wanted the CONSTANT
element to match numeric literals more precisely, including hexidecimal floating point literals, so that:
1 2 3 4 5 6 7 8 9 |
{-# LANGUAGE HexFloatLiterals #-} module Main where nums :: [Double] nums = { 1e-3 , 0b0101 , 0o7777 , 0xF.Fp-4 } |
In respect of the regex for the SYMBOL
element, when the Crayon plugin processes a regex, it replaces all (
not followed by a ?
with (?:
except for escaped (
(strictly, all \(
sequences). As a consequence, it is necessary to escape (
in regex character classes.